The goal of this paper is to offer insights, predictions, and recommendations regarding the relationship between skills and wages in professional soccer. We search for players who have been over- or undervalued, as well as teams that over- or undervalue their players. We also examine how different leagues value different skills. Our first dataset comprises the complete ratings for every soccer player in EA Sports’ FIFA 18 video game. This is the most consistent and comprehensive way to compare players available to us. EA Sports puts considerable study into its ratings. Michael Mueller-Moehring, who is responsible for the FIFA ratings, states that over 9,000 data reviewers, including coaches, professional scouts, and season ticket holders, help assign skill ratings. Also important to us in this dataset was each player’s weekly wage in euros. Positional spending data and updated Premier League table statistics were obtained from www.spotrac.com and www.bbc.co.uk using the rvest package in combination with simple scripts.
We built four linear models to predict wage as a function of a given skill, using the lm() function in R. We filtered the data by league to compare the relationship between various skills and wage in each league. We focus on the top league in each of five European countries, which together we call the power 5: the English “Premier League”, the Spanish “La Liga”, the German “Bundesliga”, the French “Ligue 1”, and the Italian “Serie A”. Looking at how teams pay their players as certain skills increase or decrease allows us to draw insights not only about the players and the teams but also about the individual leagues and how they value skills.
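The per-league filtering and fitting described above can be sketched as follows. This is a minimal illustration on synthetic data, not our actual script: the column names (`league`, `dribbling`, `wage_eur`) and the generated wages are assumptions made for the example.

```r
# Hypothetical stand-in for the FIFA 18 data; column names are assumptions.
set.seed(1)
power5 <- c("Premier League", "La Liga", "Bundesliga", "Ligue 1", "Serie A")
players <- data.frame(
  league    = rep(power5, each = 40),
  dribbling = round(runif(200, 30, 95))
)
# Synthetic weekly wage in euros (not real data).
players$wage_eur <- 500 * players$dribbling + rnorm(200, sd = 5000)

# Split the data by league and fit one lm() per league.
fits <- lapply(split(players, players$league), function(d) {
  lm(wage_eur ~ dribbling, data = d)
})

# Collect the slope and R^2 of each league's model for comparison.
league_summary <- data.frame(
  league    = names(fits),
  slope     = sapply(fits, function(m) coef(m)[["dribbling"]]),
  r_squared = sapply(fits, function(m) summary(m)$r.squared),
  row.names = NULL
)
print(league_summary)
```

Fitting a separate model per league, rather than one pooled model, lets each league have its own intercept and slope, which is what makes the league-to-league comparison possible.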
The “Wage vs. League” boxplot below provides context that should inform any analysis of our findings. Before drawing conclusions about how much a league values a given skill based on the money it allots, one should note how much that league pays in general. The boxplot shows the median wage for players in each league, with dots representing outliers. These outliers (superstars who make hundreds of thousands of euros more than the rest of their team) remain in the dataset. While it is not necessarily inappropriate to make inferences with them included, they are important to keep in mind when looking at the far right end of the model graphs, which represents the best players at a given skill.
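As a rough sketch of how the medians and outliers behind such a boxplot can be extracted, the example below uses base R on synthetic, right-skewed wages. The column names and the log-normal wage distribution are assumptions for illustration only.

```r
# Synthetic wages; the column names and distribution are assumptions.
set.seed(2)
wages <- data.frame(
  league   = rep(c("Premier League", "La Liga", "Bundesliga",
                   "Ligue 1", "Serie A"), each = 50),
  wage_eur = round(rlnorm(250, meanlog = 10, sdlog = 0.8))
)

# plot = FALSE returns the boxplot statistics without drawing;
# drop that argument to draw the plot itself.
bp <- boxplot(wage_eur ~ league, data = wages, plot = FALSE)

# Row 3 of bp$stats holds the median of each group.
median_wage <- setNames(bp$stats[3, ], bp$names)

# bp$out lists the outlying wages and bp$group the box each belongs to.
outliers_per_league <- table(factor(bp$names[bp$group], levels = bp$names))

print(median_wage)
print(outliers_per_league)
```

By default, points farther than 1.5 times the interquartile range from the box are flagged as outliers, which is why a handful of very highly paid players appear as separate dots.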
The first model relates Dribbling skill to wage. For this skill a cubic regression model was appropriate, as its R^2 value was higher than that of either a linear or a quadratic model. At our alpha level of .05, every league had a significant p-value, indicating a relationship between Dribbling skill and wage. Dribbling is an important statistic in soccer, as attackers can often use it to move up the field and create scoring opportunities. We compare how the different leagues value players based on dribbling skill.
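The comparison of linear, quadratic, and cubic fits can be sketched as below, again on synthetic data with assumed column names. Because plain R^2 cannot decrease when polynomial terms are added, the sketch also reports adjusted R^2 and a nested F-test as complementary checks; these extra checks are our suggestion here, not necessarily part of the original analysis.

```r
# Synthetic data with a cubic wage curve; names and values are assumptions.
set.seed(3)
d <- data.frame(dribbling = round(runif(150, 30, 95)))
d$wage_eur <- 0.05 * (d$dribbling - 30)^3 + rnorm(150, sd = 1500)

# Fit polynomial models of degree 1 (linear), 2 (quadratic), 3 (cubic).
fit_by_degree <- lapply(1:3, function(k) {
  lm(wage_eur ~ poly(dribbling, k), data = d)
})

r2     <- sapply(fit_by_degree, function(m) summary(m)$r.squared)
adj_r2 <- sapply(fit_by_degree, function(m) summary(m)$adj.r.squared)
print(data.frame(degree = 1:3, r_squared = r2, adj_r_squared = adj_r2))

# Nested F-tests: does each added polynomial term significantly improve the fit?
print(anova(fit_by_degree[[1]], fit_by_degree[[2]], fit_by_degree[[3]]))
```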
We have included a plotly graph for all models to allow for closer examination by the reader if desired.
The second model relates Physicality to wage. Physicality is a meta-statistic that we created for the purposes of this model. It combines three skills that we saw as indicative of a player’s physicality: strength, acceleration, and stamina. In addition to being a genuinely important and useful quality for players to have, this gave us a chance to see what our models would look like with multiple statistics in them. All the models for this skill were statistically significant predictors of wage. However, Stamina by itself was not a significant indicator for La Liga, the Bundesliga, or Serie A.
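A model with several predictors, and the distinction between overall model significance and the significance of an individual indicator such as Stamina, can be illustrated as follows. The additive form of the model, the column names, and the synthetic data are all assumptions for this sketch.

```r
# Synthetic data; names, coefficients, and the additive form are assumptions.
set.seed(4)
n <- 120
d <- data.frame(
  strength     = round(runif(n, 30, 95)),
  acceleration = round(runif(n, 30, 95)),
  stamina      = round(runif(n, 30, 95))
)
d$wage_eur <- 300 * d$strength + 400 * d$acceleration + rnorm(n, sd = 8000)

fit <- lm(wage_eur ~ strength + acceleration + stamina, data = d)
s   <- summary(fit)

# Overall model p-value from the F-statistic.
overall_p <- unname(pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3],
                       lower.tail = FALSE))

# Per-indicator p-values (dropping the intercept), tested at alpha = .05.
coef_p      <- s$coefficients[, "Pr(>|t|)"]
significant <- coef_p[-1] < 0.05

print(overall_p)
print(significant)
```

A model can be significant as a whole while one of its indicators is not, which is the situation reported above for Stamina in three of the leagues.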
Our third model relates Finishing skill to wage. Finishing is a player’s ability to complete an attack and score. A cubic model was appropriate here. All models and all indicators were significant.
For our final model we created another meta-statistic that we call tacticality. This is a combination of the skills positioning and vision. A cubic model was also appropriate here. All models were significant.
We now examine a model that predicts wage based on overall skill in the English Premier League. Using a quadratic model, we find that overall skill is a statistically significant predictor of wage. We overlay the model with a scatter plot of players, using plotly to allow examination of individual players. This allows the user to identify players who are over- or undervalued, per our model.
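One way to flag over- and undervalued players is through the residuals of the quadratic model, as in the sketch below. The player names, column names, and wages are synthetic assumptions. "Overvalued" here means paid more than the model predicts.

```r
# Synthetic Premier League sample; all names and values are assumptions.
set.seed(5)
epl <- data.frame(
  name    = paste0("player_", 1:100),
  overall = round(runif(100, 60, 92))
)
epl$wage_eur <- 150 * (epl$overall - 55)^2 + rnorm(100, sd = 15000)

# Quadratic model of wage on overall rating.
fit <- lm(wage_eur ~ poly(overall, 2), data = epl)

epl$predicted <- unname(fitted(fit))
epl$residual  <- unname(residuals(fit))  # actual wage minus predicted wage

# Positive residual: paid above the model's prediction (overvalued).
# Negative residual: paid below the model's prediction (undervalued).
epl$status <- ifelse(epl$residual > 0, "overvalued", "undervalued")

# The five most undervalued players per the model.
print(head(epl[order(epl$residual), ], 5))
```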
We now filter the same graph to display only players on Manchester United. We see many overvalued players per our model.
Filtering to display only players on Tottenham Hotspur, we see a chronic undervaluing of players per our model.
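The club-level pattern can also be summarized numerically by averaging each club's residuals from the league-wide model. The sketch below uses three hypothetical clubs with a built-in pay offset. The club names, offsets, and column names are assumptions made so the example is self-explanatory.

```r
# Three hypothetical clubs: Club A overpays, Club C underpays (by construction).
set.seed(6)
epl <- data.frame(
  club    = rep(c("Club A", "Club B", "Club C"), each = 20),
  overall = round(runif(60, 65, 90))
)
offset <- c("Club A" = 20000, "Club B" = 0, "Club C" = -20000)
epl$wage_eur <- 150 * (epl$overall - 55)^2 +
  offset[as.character(epl$club)] + rnorm(60, sd = 3000)

# League-wide quadratic model, then mean residual per club.
fit <- lm(wage_eur ~ poly(overall, 2), data = epl)
epl$residual <- unname(residuals(fit))

club_gap <- aggregate(residual ~ club, data = epl, FUN = mean)
club_gap <- club_gap[order(club_gap$residual), ]

# Most undervaluing club first, most overvaluing club last.
print(club_gap)
```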
We rank teams based on how much they have spent overall, on attackers, and on defenders, as well as where they finished in the table (by points, with ties broken by goals scored), how many goals they scored, and how many they conceded. We compare these rankings, using the DT library to create a visualization.
The following table compares teams’ rankings in overall spending and final table position. The more red a team is, the more it underperforms its overall spending. The more blue, the more it overperforms.
The following table compares teams’ rankings in attacking spending and goals scored. The more red a team is, the more it underperforms its attacking spending. The more blue, the more it overperforms.
The following table compares teams’ rankings in defensive spending and goals conceded. The more red a team is, the more it underperforms its defensive spending. The more blue, the more it overperforms.
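The rank comparison underlying these tables can be sketched in base R as follows. The five teams and their figures are hypothetical, and the `rank_gap` column is an illustrative name for the quantity the colors encode.

```r
# Hypothetical teams and figures, for illustration only.
teams <- data.frame(
  team         = c("Team A", "Team B", "Team C", "Team D", "Team E"),
  spending_eur = c(200, 150, 120, 90, 60) * 1e6,
  points       = c(70, 81, 44, 60, 50),
  goals_scored = c(60, 74, 40, 55, 45)
)

# Rank 1 = highest spender.
teams$spend_rank <- rank(-teams$spending_eur, ties.method = "min")

# Table position: sort by points, breaking ties by goals scored.
ord <- order(-teams$points, -teams$goals_scored)
teams$table_rank <- integer(nrow(teams))
teams$table_rank[ord] <- seq_len(nrow(teams))

# Positive gap: finished higher than spending rank (overperformed, blue).
# Negative gap: finished lower than spending rank (underperformed, red).
teams$rank_gap <- teams$spend_rank - teams$table_rank

print(teams[order(teams$rank_gap), c("team", "spend_rank", "table_rank", "rank_gap")])
```

The same construction applies to the attacking and defensive tables by swapping in attacking spending with goals scored, or defensive spending with goals conceded (ranked so that fewer goals conceded is better).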
If we compare the accuracy of all four of our models, we find that tacticality has the highest R^2 value, meaning it explains the highest proportion of variability in wage. This suggests that our tacticality meta-statistic is the best predictor of wage, and that teams in the power 5 are keenly aware of these types of skills and pay players accordingly. It is important to note, however, that the accuracy of this predictor may be due in part to its including two different skills, Positioning and Vision, since additional predictor variables improve a model’s fit. Interestingly, though, the lowest R^2 among our models belonged to the physicality model, which was also a meta-statistic. The fact that one meta-statistic has the lowest R^2 and another has the highest suggests that the higher correlation between tacticality and wage is not merely a result of having two predictor skills (as opposed to just one, as in the dribbling and finishing models), but reflects the value teams actually place on players according to this statistic. We find that teams do not place as much value on physical skills such as acceleration, strength, or stamina: players with higher ratings in these skills are often not given commensurate compensation, and players with lower ratings in these skills can sometimes secure large salaries. Conversely, we find that teams do place value on tactical skills like positioning and vision, and players with high ratings in these skills are most often highly compensated.
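Because the four models differ in their number of predictors, one hedged way to put them on a more equal footing is to report adjusted R^2 alongside R^2. The sketch below does this on synthetic data, using linear terms only for brevity. The column names and the data-generating process are assumptions and do not reproduce our results.

```r
# Synthetic skills and wages; names and coefficients are assumptions.
set.seed(7)
n <- 200
skills <- c("dribbling", "finishing", "positioning", "vision",
            "strength", "acceleration", "stamina")
d <- as.data.frame(setNames(
  lapply(skills, function(s) round(runif(n, 30, 95))), skills
))
d$wage_eur <- 400 * d$positioning + 400 * d$vision +
  200 * d$dribbling + rnorm(n, sd = 10000)

models <- list(
  dribbling   = wage_eur ~ dribbling,
  physicality = wage_eur ~ strength + acceleration + stamina,
  finishing   = wage_eur ~ finishing,
  tacticality = wage_eur ~ positioning + vision
)

# One row per model: predictor count, R^2, and adjusted R^2.
comparison <- do.call(rbind, lapply(names(models), function(nm) {
  s <- summary(lm(models[[nm]], data = d))
  data.frame(model         = nm,
             predictors    = nrow(s$coefficients) - 1,
             r_squared     = s$r.squared,
             adj_r_squared = s$adj.r.squared)
}))
print(comparison)
```

Adjusted R^2 penalizes each additional predictor, so a multi-skill model that still ranks highest on it is less likely to owe its fit to predictor count alone.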
For dribbling vs. wage, our model predicts that La Liga pays its players very little until they reach a rating of 80 in dribbling. This is in line with the idea that La Liga pays exorbitant sums to superstars like Cristiano Ronaldo and Lionel Messi and doesn’t have much room on the payroll for players with lower skills. The Premier League has more money to pay its players a consistently high sum, but is surpassed by La Liga when it comes to paying superstars. These outliers often find themselves in La Liga, though a player with solid dribbling (rated 60-87) might find a better deal in the Premier League until they hit their peak.
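Crossover ratings like the one described above can be located by predicting each league's wage curve on a common grid of ratings and finding where one curve first exceeds the other. The sketch below uses noise-free synthetic curves so that the result is reproducible. The curves, column names, and helper function `fit_league` are hypothetical.

```r
# Noise-free synthetic wage curves (assumptions, not our data):
# a linear curve for one league and a steep cubic curve for the other.
ratings <- 40:95
players <- rbind(
  data.frame(league = "Premier League", dribbling = ratings,
             wage_eur = 1000 * ratings),
  data.frame(league = "La Liga", dribbling = ratings,
             wage_eur = 0.15 * ratings^3)
)

# Hypothetical helper: cubic model for a single league.
fit_league <- function(lg) {
  lm(wage_eur ~ poly(dribbling, 3), data = players[players$league == lg, ])
}

# Predict both leagues on the same grid of ratings.
grid    <- data.frame(dribbling = 40:95)
pred_pl <- predict(fit_league("Premier League"), newdata = grid)
pred_ll <- predict(fit_league("La Liga"), newdata = grid)

# First rating at which La Liga's predicted wage exceeds the Premier League's.
crossover <- grid$dribbling[which(pred_ll > pred_pl)[1]]
print(crossover)
```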
For physicality, our models predict that the Premier League values this skill more than any other league for all ratings above 60. La Liga is a close second, with the other three leagues trailing behind in succession.
In our finishing vs. wage model we can see that La Liga’s predicted wage actually falls for players with finishing above 35 and then shoots up in the 60s and above. This could indicate that La Liga does not have room on its payroll for anyone with subpar or mediocre finishing. The predicted wage again passes that of the Premier League, this time at 83 finishing. La Liga thus has little to offer average finishers. We can also see the Bundesliga place an increased value on finishers rated 80 and above, as it surpasses Serie A at this rating.
We can see the same trends in our model for tacticality. La Liga places the smallest value on mediocre players in this skill; its predicted wage then shoots up once players reach around the 70s, again overtaking the Premier League at a rating of around 87. The Premier League is still the best place to be for a low to average tactical player. France’s Ligue 1 places more value on tactical players than on any other skill we have seen thus far. Elite tactical players with ratings in the high 80s can expect to be paid more in Ligue 1 than in Serie A.
We observed a chronic undervaluing of players on Tottenham Hotspur. Tottenham has, for the past few seasons, consistently finished high in the table and played European football. However, the club does not have the finances of the typical “Big Clubs,” and is currently building a new stadium. This new stadium should increase revenue, but is a large expenditure. This contributes to the club’s inability to pay its players what our model predicts they deserve. Tottenham must try to retain its top-level squad and prevent its players from being enticed by large wages at competing clubs.
The over/underperformance tables reveal, most notably, underperformance by Arsenal, with a sixth-place finish, and by Swansea and Stoke, traditionally larger clubs that will be relegated this season. We also see the already discussed overperformance by Tottenham.